fix(workspace): enforce isolation policy, surface degraded mode in TUI by windoliver · Pull Request #249 · windoliver/grove

windoliver · 2026-04-13T00:42:15Z

Fixes #208.

Summary

Workspace provisioning failures in SessionOrchestrator.spawnAgent() and SpawnManager.spawn() were silently swallowed — agents ran in the shared project root with no error surfaced
Replaced blanket catch with policy-aware WorkspaceIsolationPolicy (strict | allow-fallback) and WorkspaceMode tagged union (isolated_worktree | fallback_workspace | bootstrap_failed)
TUI spawn-progress screen now shows a [shared workspace] / [no config] badge + inline ⚠ <reason> for degraded agents

Changes

Core

workspace-provisioner: fully async via execFileAsync (no event-loop blocking, no shell injection); parallel cleanup with Promise.allSettled; WorkspaceMode + WorkspaceIsolationPolicy exported from core/index
session-orchestrator: new provisionAgentWorkspace() helper with exhaustive policy-aware control flow; workspaceMode on AgentSessionInfo; default policy strict
session-service: threads workspaceIsolationPolicy through to orchestrator

TUI

spawn-manager: replaces inline execSync git worktree add with provisionWorkspace(); setIsolationPolicy(); workspaceMode on SpawnResult; default allow-fallback (backward compat)
spawn-progress: [shared workspace] / [no config] badge + ⚠ reason line for degraded agents
screen-manager: threads workspaceMode from SpawnResult into AgentSpawnState

Tests

7 new workspace-bootstrap unit tests
5 new policy enforcement tests in session-orchestrator
2 new policy enforcement tests in spawn-manager
All existing spawn-success tests retrofitted with workspaceMode assertions
3 tmux capture-pane E2E tests verifying on-screen badge visibility:
- allow-fallback + non-git dir → ● started [shared workspace] + ⚠ Command failed: git worktree add ...
- strict + non-git dir → ✗ failed: Workspace provisioning failed for '...'
- allow-fallback + real git repo → ● started (no badge — happy path)

Full suite: 4898 pass, 0 fail.

Test plan

Run bun test — 4898 pass
Run bun test tests/tui/workspace-isolation-e2e.test.ts — 3 pass (requires tmux)
Start TUI against a non-git directory with default policy — agents show [shared workspace] badge
Start TUI against a real grove with git — agents show plain ● started

Fixes #208. Workspace provisioning failures in SessionOrchestrator and SpawnManager were silently swallowed, causing agents to run in a shared project root with no operator visibility. Changes: - workspace-provisioner: fully async via execFileAsync (no event-loop block, no shell injection); parallel cleanup with Promise.allSettled - WorkspaceMode tagged union (isolated_worktree | fallback_workspace | bootstrap_failed) + WorkspaceIsolationPolicy (strict | allow-fallback) exported from core/index - session-orchestrator: provisionAgentWorkspace() helper with exhaustive policy-aware control flow; workspaceMode on AgentSessionInfo; default policy strict - spawn-manager: provisionWorkspace() replaces inline execSync git worktree; setIsolationPolicy(); workspaceMode on SpawnResult; default policy allow-fallback (TUI backward compat) - session-service: threads workspaceIsolationPolicy through to orchestrator - spawn-progress: [shared workspace] / [no config] badge + ⚠ reason line for degraded agents; screen-manager threads workspaceMode from SpawnResult - Tests: workspace-bootstrap unit tests (7), policy enforcement tests in orchestrator (5) and spawn-manager (2), retrofitted all existing spawn tests with workspaceMode assertions - E2E: tmux capture-pane test (3 scenarios) verifies badge visible on screen for fallback_workspace, hard failure for strict, clean for isolated_worktree

delegates/feeds/escalates edges now cause the target role's worktree to branch off the source role's grove branch instead of HEAD. coder → reviewer (delegates) coder worktree: grove/<sessionId>/coder (base: HEAD) reviewer worktree: grove/<sessionId>/reviewer (base: grove/<sessionId>/coder) This gives reviewers a real git branch relationship — they can run `git merge grove/<sessionId>/coder` to get the coder's latest commits without touching the main checkout. Other edge types (reports, requests, feedback) remain independent (HEAD). Changes: - topology: WORKSPACE_BRANCH_EDGES constant, resolveRoleWorkspaceStrategies(), topologicalSortRoles() — Kahn's algorithm ensures source branches exist before dependents try to base their worktrees on them - session-orchestrator: provision workspaces in topological order then spawn in parallel; passes resolved baseBranch to provisionAgentWorkspace() - grove-md-builder: annotates edge_type lines with workspace behaviour comment so GROVE.md is self-documenting - sqlite: worktree_strategy_json column on sessions table (migration + DDL); stores resolved strategies at creation time for operator visibility - session: worktreeStrategies field on Session type - tests: 13 new unit tests for resolveRoleWorkspaceStrategies and topologicalSortRoles covering all 6 edge types + chain ordering

SpawnManager now uses resolveRoleWorkspaceStrategies() to pick the correct baseBranch per role based on topology edge types. ScreenManager passes the resolved topology to SpawnManager before spawning begins. Also fixes the branch-naming mismatch: spawn() now uses wsSessionId (stable session-level ID) for provisionWorkspace so the branch computed by resolveRoleWorkspaceStrategies matches the branch actually created for the source role. Validated via 3 tmux capture-pane scenarios: 1. delegates + real git → both isolated_worktree, no badge 2. feedback + real git → both isolated_worktree, independent 3. delegates + no git → both fallback_workspace, [shared workspace] badge

The TUI screen-manager fired all role spawns concurrently, racing coder and reviewer worktree provisioning. With delegates edges the reviewer needs coder's git branch to exist first — concurrent spawns caused reviewer to fail with "git worktree add ... branch not found" and fall back to shared workspace. Fix: import topologicalSortRoles and spawn sequentially — source roles complete before dependents start. Validated via real TUI E2E: 1. grove up → New Session → review-loop → goal → launch 2. Spawn-progress shows: coder spawning → coder started → reviewer spawning → reviewer started 3. git worktree list confirms both worktrees on correct branches 4. Reviewer branch based on coder branch (delegates edge)

SessionOrchestrator hardcoded mcpServePath as join(projectRoot, "src/mcp/serve.ts") which only works when the project IS the grove repo. For any other project (e.g. /tmp/foo), agents got a non-existent MCP server path and couldn't call grove_submit_work, grove_submit_review, or grove_done. Fix: extract resolveMcpServePath() into shared utility that derives the path from process.argv[1] (the running grove CLI entry point), climbing 3 levels to find the grove installation directory. Falls back through dist/mcp/serve.js → src/mcp/serve.ts → import.meta.url variants. DRYs up SpawnManager which had the same logic inline in two places. Validated E2E: real claude agent via acpx in /tmp test project successfully called grove_submit_work — contribution CID stored in session_contributions with the correct session link.

…from edge type Edge type describes communication semantics. Workspace strategy describes file isolation. These are independent concerns — a reviewer might want a delegates edge but still work from HEAD. Before: WORKSPACE_BRANCH_EDGES constant implicitly derived workspace strategy from edge type (delegates/feeds/escalates → branch_from_source). No user control, surprise behavior when adding edges. After: explicit opt-in field on each edge: edges: - target: reviewer edge_type: delegates workspace: branch_from_source # optional, default: independent - Removed WORKSPACE_BRANCH_EDGES constant - resolveRoleWorkspaceStrategies checks edge.workspace, not edge type - topologicalSortRoles checks edge.workspace for ordering - Zod schema validates "branch_from_source" | "independent" - Wire conversion passes workspace through - review-loop preset updated with explicit workspace: branch_from_source - Reverted grove-md-builder YAML comments (GROVE.md unchanged) - grove-md-builder renders workspace field when present

…ediately grove session start exited right after spawning agents — the reviewer never got a chance to receive routed events from the coder's contribution. Fix: SessionOrchestrator.waitForCompletion() polls agent status every 3s and resolves when all agents are idle/stopped or timeout (5min) expires. CLI awaits this before marking session complete and closing DB. Validated: session now reaches "All agents idle — session complete" with coder submitting work via grove_submit_work. Reviewer receives routed events but doesn't call grove_submit_review (agent prompt issue, not infrastructure).

Three bugs prevented the review-loop handoff from completing: 1. Missing .acpxrc.json in orchestrator path bootstrapWorkspace only wrote .mcp.json, but acpx reads .acpxrc.json. Without it, agents had zero MCP tools — couldn't call grove_submit_work or grove_submit_review. Fix: write .acpxrc.json alongside .mcp.json. 2. Duplicate prompts orchestrator.start() sent goals via runtime.send() AFTER spawn() already sent them as initial prompt. Reviewer got the same prompt twice. Fix: remove the redundant send loop — spawn passes goals via AgentConfig.goal. 3. Premature session stop checkAllIdle() stopped the session when both agents went idle, even if no contributions existed yet. Coder goes idle briefly between editing and calling grove_submit_work. Fix: 30s grace period — don't auto-stop until at least one contribution exists or 30s have passed. Validated E2E with real claude agents via acpx: coder → work: "Added greet(name) function" reviewer → review: "LGTM — greet(name) correctly returns..." reviewer → discussion: "[DONE] Approved — implementation is correct"

Edge types simplified to 3 core types: delegates — forward work (A produces, B acts on it) feedback — response (B sends results back to A) monitors — observe-only (A watches B) Legacy aliases (reports, feeds, requests, escalates) kept in schema for backward compat but all presets migrated to use core types only. Routing message now includes source branch and merge instructions so the receiving agent can actually see file changes: "To see the actual file changes, run: git merge grove/<session>/<source>" CLAUDE.md updated with workflow steps: 1. git merge <source_branch> to get their files 2. Review the code 3. Call grove_submit_review with targetCid from notification

Issue A — git commit as work product: grove_submit_work now accepts commitHash (git SHA) instead of requiring CAS-per-file artifacts. Agents just commit and pass the hash. Other agents git fetch + git diff to see changes. Old artifacts path still works for backward compat. Changed: models.ts (commitHash field), contributions.ts (MCP schema), contribute.ts (skip CAS validation when commitHash present), acpx-runtime.ts (system-reminder uses git workflow), workspace-bootstrap.ts (CLAUDE.md teaches git-based workflow) Issue B — broadcast topology mode: Roles can set mode: "broadcast" to notify all other roles without explicit edges. No more N×(N-1) edge definitions for swarms. review-loop preset: coder and reviewer both use mode: "broadcast" instead of explicit delegates/feedback edges. Changed: topology.ts (mode field on role schema + wire type), topology-router.ts (broadcast routes to all roles), grove-md-builder.ts (renders mode), review-loop.ts preset Edge types (delegates/feedback/monitors) kept in schema for backward compat but no longer required when using broadcast mode.

…oundary MCP tools run in child processes (spawned by acpx/claude) with separate EventBus instances. Events published from grove_submit_work in the MCP process never reached the orchestrator's EventBus in the parent process. Fix: SessionOrchestrator polls the SQLite contribution store every 3s, detects new CIDs via a seen-set, and forwards them to downstream agents via runtime.send(). This is the same pattern the TUI uses via SpawnManager.startContributionPolling(). Validated: reviewer log shows the routed contribution message arriving with CID, summary, and source branch — the polling successfully bridges the process boundary.

…gent turns Contributions from fast agents (coder) were arriving in the same acpx session turn as the reviewer's initial prompt — the agent treated them as context instead of an action trigger and went idle. Fix: seed seenCids with pre-existing contributions, delay first poll by 15s so agents process their initial prompt first. Skip EventBus-based contribution forwarding entirely (polling handles it, EventBus can't cross process boundaries anyway). Validated E2E — full coder→reviewer handoff: work(coder): Added greet(name) function review(reviewer): LGTM — clean implementation work(coder): Added greet(name) function (iteration) discussion(reviewer): [DONE] Approved — implementation meets standards

Reviewer now gets the source agent's workspace path in the notification instead of a git branch name. No git merge needed — the reviewer reads files directly from the source workspace path: [grove] New work from coder: Workspace: /tmp/project/.grove/workspaces/coder-abc/ ACTION REQUIRED: Read the source files at /tmp/.../coder-abc/ The filesystem IS the handoff artifact. Both agents are on the same machine — the reviewer just reads the coder's directory. Updated CLAUDE.md and acpx system-reminder to teach agents the path-based workflow instead of git merge.

pollContributions now checks for [DONE] prefix or context.done=true in contributions. When detected, stops the session with a descriptive reason. This mirrors use-done-detection.ts from the TUI layer. Combined with the CLI's markDone(), the session status in SQLite transitions to "completed" with the proper stop reason.

Coder prompt: git commit → git rev-parse HEAD → grove_submit_work({ commitHash }) Reviewer prompt: read files at Workspace path from notification, not summary Validated E2E — full coder→reviewer handoff with real file inspection: 1. Coder edits app.js, commits, submits work 2. Polling routes notification with workspace path to reviewer 3. Reviewer reads coder's actual files (Read /path/to/coder/app.js) 4. Reviewer submits review with real code assessment 5. Reviewer calls grove_done → session status=completed 6. Project root untouched, worktrees isolated

windoliver added 17 commits April 12, 2026 17:41

style: fix biome lint errors in test/harness files

7f498bd

style: fix pre-existing biome lint in nexus-lifecycle.test.ts

c28628f

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(workspace): enforce isolation policy, surface degraded mode in TUI#249

fix(workspace): enforce isolation policy, surface degraded mode in TUI#249
windoliver wants to merge 17 commits intomainfrom
worktree-temporal-whistling-sundae

windoliver commented Apr 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

windoliver commented Apr 13, 2026

Summary

Changes

Tests

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant